

Mutual Information Collapse Explains Disentanglement Failure in $β$-VAEs

Vu, Minh, Wan, Xiaoliang, Wei, Shuangqing

arXiv.org Machine Learning

The $β$-VAE is a foundational framework for unsupervised disentanglement, using $β$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: metrics such as MIG and SAP typically peak at intermediate $β$ and collapse as regularization increases. We demonstrate that this collapse is a fundamental information-theoretic failure, where strong Kullback-Leibler pressure promotes marginal independence at the expense of the latent channel's semantic informativeness. By formalizing this mechanism in a linear-Gaussian setting, we prove that for $β> 1$, stationarity-induced dynamics trigger a spectral contraction of the encoder gain, driving latent-factor mutual information to zero. To resolve this, we introduce the $λβ$-VAE, which decouples regularization pressure from informational collapse via an auxiliary $L_2$ reconstruction penalty $λ$. Extensive experiments on dSprites, Shapes3D, and MPI3D-real confirm that $λ> 0$ stabilizes disentanglement and restores latent informativeness over a significantly broader range of $β$, providing a principled theoretical justification for dual-parameter regularization in variational inference backbones.
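The dual-parameter objective the abstract describes can be sketched as a per-sample loss: the usual reconstruction term, the KL term weighted by $β$, and an auxiliary $L_2$ reconstruction penalty weighted by $λ$. The exact placement of the $λ$ term in the paper's objective is an assumption here; this is only an illustrative sketch, not the authors' formulation.

```python
import numpy as np

def lambda_beta_vae_loss(x, x_hat, mu, log_var, beta=4.0, lam=0.5):
    """Illustrative per-sample lambda-beta-VAE loss (assumed form).

    recon: Gaussian reconstruction term (squared error, up to constants).
    kl:    KL(q(z|x) || N(0, I)) for a diagonal-Gaussian encoder.
    The lam * recon term is the auxiliary L2 penalty; with lam > 0,
    reconstruction keeps nonzero weight even as beta grows, so KL
    pressure alone cannot drive the encoder gain to zero.
    """
    recon = np.sum((x - x_hat) ** 2)
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return recon + beta * kl + lam * recon

# Toy numbers to exercise the formula.
x = np.array([1.0, -2.0]); x_hat = np.array([0.5, -1.5])
mu = np.array([0.1, -0.1]); log_var = np.array([0.0, 0.0])
loss = lambda_beta_vae_loss(x, x_hat, mu, log_var)
```

With unit encoder variances (`log_var = 0`) the KL term reduces to `0.5 * sum(mu**2)`, which makes the toy value easy to check by hand.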


VSCOUT: A Hybrid Variational Autoencoder Approach to Outlier Detection in High-Dimensional Retrospective Monitoring

Martinez, Waldyn G.

arXiv.org Machine Learning

Modern industrial and service processes generate high-dimensional, non-Gaussian, and contamination-prone data that challenge the foundational assumptions of classical Statistical Process Control (SPC). Heavy tails, multimodality, nonlinear dependencies, and sparse special-cause observations can distort baseline estimation, mask true anomalies, and prevent reliable identification of an in-control (IC) reference set. To address these challenges, we introduce VSCOUT, a distribution-free framework designed specifically for retrospective (Phase I) monitoring in high-dimensional settings. VSCOUT combines an Automatic Relevance Determination Variational Autoencoder (ARD-VAE) architecture with ensemble-based latent outlier filtering and changepoint detection. The ARD prior isolates the most informative latent dimensions, while the ensemble and changepoint filters identify pointwise and structural contamination within the determined latent space. A second-stage retraining step removes flagged observations and re-estimates the latent structure using only the retained inliers, mitigating masking and stabilizing the IC latent manifold. This two-stage refinement produces a clean and reliable IC baseline suitable for subsequent Phase II deployment. Extensive experiments across benchmark datasets demonstrate that VSCOUT achieves superior sensitivity to special-cause structure while maintaining controlled false alarms, outperforming classical SPC procedures, robust estimators, and modern machine-learning baselines. Its scalability, distributional flexibility, and resilience to complex contamination patterns position VSCOUT as a practical and effective method for retrospective modeling and anomaly detection in AI-enabled environments.
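The two-stage refinement described above can be sketched in miniature, with a robust z-score standing in for the ARD-VAE latent scoring (a deliberate simplification; the paper's actual scoring, ensemble filters, and changepoint detection are not reproduced here):

```python
import numpy as np

def two_stage_phase1(X, z_cut=3.5):
    """Sketch of two-stage Phase I refinement.

    Stage 1: score each observation (here a MAD-based robust z-score,
    standing in for the ARD-VAE latent filters) and flag outliers.
    Stage 2: re-estimate the baseline from retained inliers only,
    mitigating masking by the flagged contamination.
    """
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) + 1e-12  # avoid divide-by-zero
    score = np.max(0.6745 * np.abs(X - med) / mad, axis=1)
    inliers = score <= z_cut
    # Stage 2: refit location/scale on inliers -> the in-control baseline
    baseline_mean = X[inliers].mean(axis=0)
    baseline_cov = np.cov(X[inliers], rowvar=False)
    return inliers, baseline_mean, baseline_cov
```

The point of the second stage is that a single gross outlier no longer inflates the baseline estimate handed to Phase II monitoring.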


Provenance of AI-Generated Images: A Vector Similarity and Blockchain-based Approach

Sharma, Jitendra, Carvalho, Arthur, Bhunia, Suman

arXiv.org Artificial Intelligence

Rapid advancement in generative AI and large language models (LLMs) has enabled the generation of highly realistic and contextually relevant digital content. LLMs such as ChatGPT with DALL-E integration and Stable Diffusion techniques can produce images that are often indistinguishable from those created by humans, which poses challenges for digital content authentication. Verifying the integrity and origin of digital data to ensure it remains unaltered and genuine is crucial to maintaining trust and legality in digital media. In this paper, we propose an embedding-based AI image detection framework that utilizes image embeddings and vector similarity to distinguish AI-generated images from real (human-created) ones. Our methodology is built on the hypothesis that AI-generated images demonstrate closer embedding proximity to other AI-generated content, while human-created images cluster similarly within their domain. To validate this hypothesis, we developed a system that processes a diverse dataset of AI and human-generated images through five benchmark embedding models. Extensive experimentation demonstrates the robustness of our approach, and our results confirm that moderate to high perturbations minimally impact the embedding signatures, with perturbed images maintaining close similarity matches to their original versions. Our solution provides a generalizable framework for AI-generated image detection that balances accuracy with computational efficiency.
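The core hypothesis, that an image's embedding sits closer to the cluster of its own provenance class, reduces to a nearest-reference comparison under cosine similarity. A minimal sketch, with placeholder reference vectors in place of real embedding-model outputs:

```python
import numpy as np

def classify_by_embedding(query, ai_refs, human_refs):
    """Label a query embedding by which reference set it is closer to
    in cosine similarity. In the paper's setting the references would
    come from one of the benchmark embedding models; here they are
    placeholder vectors for illustration.
    """
    def max_cos(q, refs):
        q = q / np.linalg.norm(q)
        r = refs / np.linalg.norm(refs, axis=1, keepdims=True)
        return float(np.max(r @ q))  # best match within the set
    return "ai" if max_cos(query, ai_refs) >= max_cos(query, human_refs) else "human"
```

A vector-database lookup would replace the brute-force `max` in practice, but the decision rule is the same.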


AI in Pakistani Schools: Adoption, Usage, and Perceived Impact among Educators

Raza, Syed Hassan, Farooq, Azib

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) is increasingly permeating classrooms worldwide, yet its adoption in schools of developing countries remains under-explored. This paper investigates AI adoption, usage patterns, and perceived impact in Pakistani K-12 schools based on a survey of 125 educators. The questionnaire covered educators' familiarity with AI, frequency and modes of use, and attitudes toward AI's benefits and challenges. Results reveal a generally positive disposition towards AI: over two-thirds of teachers expressed willingness to adopt AI tools given proper support, and many have begun integrating AI for lesson planning and content creation. However, AI usage is uneven - while about one-third of respondents actively use AI tools frequently, others remain occasional users. Content generation emerged as the most common AI application, whereas AI-driven grading and feedback are rarely used. Teachers reported moderate improvements in student engagement and efficiency due to AI, but also voiced concerns about equitable access. These findings highlight both the enthusiasm for AI's potential in Pakistan's schools and the need for training and infrastructure to ensure inclusive and effective implementation.


Learning Decomposed Contextual Token Representations from Pretrained and Collaborative Signals for Generative Recommendation

Liu, Yifan, Liu, Yaokun, Li, Zelin, Yue, Zhenrui, Lee, Gyuseok, Yao, Ruichen, Zhang, Yang, Wang, Dong

arXiv.org Artificial Intelligence

Recent advances in generative recommenders adopt a two-stage paradigm: items are first tokenized into semantic IDs using a pretrained tokenizer, and then large language models (LLMs) are trained to generate the next item via sequence-to-sequence modeling. However, these two stages are optimized for different objectives: semantic reconstruction during tokenizer pretraining versus user interaction modeling during recommender training. This objective misalignment leads to two key limitations: (i) suboptimal static tokenization, where fixed token assignments fail to reflect diverse usage contexts; and (ii) discarded pretrained semantics, where pretrained knowledge--typically from language model embeddings--is overwritten during recommender training on user interactions. To address these limitations, we propose to learn DEcomposed COntextual Token Representations (DECOR), a unified framework that preserves pretrained semantics while enhancing the adaptability of token embeddings. DECOR introduces contextualized token composition to refine token embeddings based on user interaction context, and decomposed embedding fusion that integrates pretrained codebook embeddings with newly learned collaborative embeddings. Experiments on three real-world datasets demonstrate that DECOR consistently outperforms state-of-the-art baselines in recommendation performance. Our code will be made available upon publication.
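The decomposed embedding fusion idea can be sketched as follows: a token's representation combines its frozen pretrained codebook vector with a newly learned collaborative vector. The scalar gate `alpha` is a simplification of whatever fusion DECOR actually learns; this is an illustrative sketch only.

```python
import numpy as np

def fused_token_embedding(token_id, pretrained_codebook, collab_table, alpha=0.5):
    """Sketch of decomposed embedding fusion (assumed, simplified form).

    e_sem: frozen pretrained codebook vector, preserving semantics.
    e_cf:  collaborative vector learned from user interactions.
    A fixed convex combination stands in for the learned fusion.
    """
    e_sem = pretrained_codebook[token_id]
    e_cf = collab_table[token_id]
    return alpha * e_sem + (1.0 - alpha) * e_cf
```

The key property the abstract argues for is visible in the structure: gradients during recommender training would flow into `collab_table` while the pretrained codebook stays intact, so pretrained semantics are never overwritten.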


OmniAcc: Personalized Accessibility Assistant Using Generative AI

Karki, Siddhant, Han, Ethan, Mahmud, Nadim, Bhunia, Suman, Femiani, John, Raychoudhury, Vaskar

arXiv.org Artificial Intelligence

Individuals with ambulatory disabilities often encounter significant barriers when navigating urban environments due to the lack of accessible information and tools. This paper presents OmniAcc, an AI-powered interactive navigation system that utilizes GPT-4, satellite imagery, and OpenStreetMap data to identify, classify, and map wheelchair-accessible features such as ramps and crosswalks in the built environment. OmniAcc offers personalized route planning, real-time hands-free navigation, and instant query responses regarding physical accessibility. By using zero-shot learning and customized prompts, the system ensures precise detection of accessibility features, while supporting validation through structured workflows. This paper introduces OmniAcc and explores its potential to assist urban planners and mobility-aid users, demonstrated through a case study on crosswalk detection. With a crosswalk detection accuracy of 97.5%, OmniAcc highlights the transformative potential of AI in improving navigation and fostering more inclusive urban spaces.


SketchDNN: Joint Continuous-Discrete Diffusion for CAD Sketch Generation

Chereddy, Sathvik, Femiani, John

arXiv.org Artificial Intelligence

We present SketchDNN, a generative model for synthesizing CAD sketches that jointly models both continuous parameters and discrete class labels through a unified continuous-discrete diffusion process. Our core innovation is Gaussian-Softmax diffusion, where logits perturbed with Gaussian noise are projected onto the probability simplex via a softmax transformation, facilitating blended class labels for discrete variables. This formulation addresses two key challenges: the heterogeneity of primitive parameterizations and the permutation invariance of primitives in CAD sketches. Our approach significantly improves generation quality, reducing Fréchet Inception Distance (FID) from 16.04 to 7.80 and negative log-likelihood (NLL) from 84.8 to 81.33, establishing a new state-of-the-art in CAD sketch generation on the SketchGraphs dataset.
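The Gaussian-Softmax step described above is simple to state in code: add Gaussian noise to the class logits, then softmax onto the probability simplex, yielding a blended (soft) class label. This sketch shows only that single forward-noising step, not the full diffusion schedule or reverse process:

```python
import numpy as np

def gaussian_softmax_step(logits, sigma, rng):
    """One forward-noising step of Gaussian-Softmax diffusion (sketch).

    Perturb class logits with Gaussian noise of scale sigma, then
    project onto the probability simplex with a softmax, producing a
    blended class label for the discrete variable.
    """
    z = logits + sigma * rng.standard_normal(logits.shape)
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)
```

By construction the output is always a valid categorical distribution, which is what lets continuous and discrete variables share one diffusion process.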


The use of cross validation in the analysis of designed experiments

Weese, Maria L., Smucker, Byran J., Edwards, David J.

arXiv.org Machine Learning

Cross-validation (CV) is a common method to tune machine learning methods and can be used for model selection in regression as well. Because of the structured nature of small, traditional experimental designs, the literature has warned against using CV in their analysis. The striking increase in the use of machine learning, and thus CV, in the analysis of experimental designs has led us to empirically study the effectiveness of CV compared to other methods of selecting models in designed experiments, including the little bootstrap. We consider both response surface settings where prediction is of primary interest, as well as screening where factor selection is most important. Overall, we provide evidence that the use of leave-one-out cross-validation (LOOCV) in the analysis of small, structured designs is often useful. More general $k$-fold CV may also be competitive, but its performance is uneven.
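For reference, LOOCV in a regression setting is just: fit on $n-1$ points, predict the held-out point, and average the squared errors. A minimal sketch with an ordinary least-squares model (the fit/predict helpers are illustrative, not from the paper):

```python
import numpy as np

def loocv_mse(X, y, fit, predict):
    """Leave-one-out cross-validation error for a regression model."""
    n = len(y)
    errs = []
    for i in range(n):
        mask = np.arange(n) != i          # hold out observation i
        model = fit(X[mask], y[mask])
        errs.append((predict(model, X[i:i + 1])[0] - y[i]) ** 2)
    return float(np.mean(errs))

def ols_fit(X, y):
    """Least-squares coefficients with an intercept column."""
    return np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)[0]

def ols_predict(beta, X):
    return np.c_[np.ones(len(X)), X] @ beta
```

In a designed experiment `X` would be the (small, structured) design matrix, which is exactly the setting where the literature has warned LOOCV may behave unexpectedly.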


Reliable Decision Support with LLMs: A Framework for Evaluating Consistency in Binary Text Classification Applications

Megahed, Fadel M., Chen, Ying-Ju, Jones-Farmer, L. Allison, Lee, Younghwa, Wang, Jiawei Brooke, Zwetsloot, Inez M.

arXiv.org Machine Learning

"LLM-based annotation has become something of an academic Wild West: the lack of established practices and standards has led to concerns about the quality and validity of research. Researchers have warned that the ostensible simplicity of LLMs can be misleading, as they are prone to bias, misunderstandings, and unreliable results" [1, p.1]. "LLMs outperform typical human annotators. The evidence is consistent across different types of texts and time periods. It strongly suggests that ChatGPT may already be a superior approach compared to crowd annotations on platforms such as MTurk. At the very least, the findings demonstrate the importance of studying the text-annotation properties and capabilities of LLMs more in depth" [2, p.2]. Together, these contrasting perspectives highlight the need to critically examine large language models (LLMs) for text annotation and classification. Although human annotation remains widespread, it poses considerable challenges. It is time-consuming and costly--up to $5 per annotation and $50 per hour for annotators [3]--and often suffers from inconsistencies stemming from the intricacies of language and the subjectivity of annotators [4].


TrojanWhisper: Evaluating Pre-trained LLMs to Detect and Localize Hardware Trojans

Faruque, Md Omar, Jamieson, Peter, Patooghy, Ahmad, Badawy, Abdel-Hameed A.

arXiv.org Artificial Intelligence

Existing Hardware Trojan (HT) detection methods face several critical limitations: logic testing struggles with scalability and coverage for large designs, side-channel analysis requires golden reference chips, and formal verification methods suffer from state-space explosion. The emergence of Large Language Models (LLMs) offers a promising new direction for HT detection by leveraging their natural language understanding and reasoning capabilities. For the first time, this paper explores the potential of general-purpose LLMs in detecting various HTs inserted in Register Transfer Level (RTL) designs, including SRAM, AES, and UART modules. We propose a novel tool for this goal that systematically assesses state-of-the-art LLMs (GPT-4o, Gemini 1.5 Pro, and Llama 3.1) in detecting HTs without prior fine-tuning. To address potential training data bias, the tool implements perturbation techniques (variable name obfuscation and design restructuring) that make the cases harder for the evaluated LLMs. Our experimental evaluation demonstrates perfect detection rates by GPT-4o and Gemini 1.5 Pro in baseline scenarios (100%/100% precision/recall), with both models achieving better trigger line coverage (TLC: 0.82-0.98) than payload line coverage (PLC: 0.32-0.46). Under code perturbation, while Gemini 1.5 Pro maintains perfect detection performance (100%/100%), GPT-4o (100%/85.7%) and Llama 3.1 (66.7%/85.7%) show some degradation in detection rates, and all models experience decreased accuracy in localizing both triggers and payloads. This paper validates the potential of LLM approaches for hardware security applications, highlighting areas for future improvement.
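The variable-name obfuscation perturbation mentioned above amounts to renaming identifiers in the RTL source so that memorized training examples no longer match verbatim. A minimal sketch (the identifier list and `sig_N` naming scheme are illustrative, not the tool's actual convention):

```python
import re

def obfuscate_identifiers(rtl_src, names):
    """Rename the given RTL identifiers to neutral tokens.

    This is the kind of perturbation used to counter LLM training-data
    bias: word-boundary matching ensures substrings embedded in other
    identifiers or keywords are left untouched.
    """
    out = rtl_src
    for i, name in enumerate(names):
        out = re.sub(rf"\b{re.escape(name)}\b", f"sig_{i}", out)
    return out
```

Applied to a snippet containing a tell-tale name, the semantics of the design are unchanged while any lexical cue an LLM might have memorized is removed.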